Understand the role of random variables and common statistical distributions in formulating modern statistical regression models.
Instead of errors, think about the normal distribution as a data-generating mechanism:
Replace the normal distribution as the data-generating mechanism with another probability distribution, but which one?
Leads us to:
See handout for probability rules and distributions!
Sample space = the set of all possible outcomes that could occur.
Discrete variables (countable set of possible values)
Continuous variables (range of possible values)
The probability of event A, P(A), is the long run frequency or proportion of times the event occurs.
A random variable is a numeric quantity (or numerical event) that changes from trial to trial in a random process.
It is essentially a mapping that takes us from random events to numbers.
A random variable is discrete if it can take on a finite (or countably infinite) set of possible values.
A random variable is continuous if it takes values within some interval.
A probability mass function, \(p(x)\), assigns a probability to each value of a discrete random variable, \(X\).
Example: X = number of heads in two coin flips
Possible events: {HH, TH, HT, TT} (all equally likely)
Possible values of \(X\) = {0, 1, 2}
Note: for any probability mass function \(\sum p(x) = 1\)
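A quick check in R (since \(X\) here is binomial with \(n = 2\), \(p = 0.5\)):

```r
# pmf of X = number of heads in two fair coin flips
x <- 0:2
p <- dbinom(x, size = 2, prob = 0.5)
p       # 0.25 0.50 0.25
sum(p)  # pmf sums to 1
```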
For continuous variables, we define probabilities as areas under a curve, e.g., \(P(a \le X \le b)\):
Probability density function, \(f(x)\)
Cumulative distribution function, \(F(x) = P(X \le x) = \int_{-\infty}^{x}f(t)\,dt\)
The mean for a discrete random variable with probability function, \(p(x)\), is given by:
\[E[X] = \sum x\,p(x)\]
Example: Calculate \(E[X]\), where \(X\) = sum of two dice
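A minimal sketch of this calculation in R, building the pmf of the sum by hand:

```r
# E[X] for X = sum of two fair dice
x <- 2:12
p <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1) / 36  # p(x) for x = 2, ..., 12
sum(x * p)  # E[X] = 7
```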
The variance for a discrete random variable with probability function, \(p(x)\), and mean \(E[X]\) is given by:
\[\mathrm{Var}(X) = E\left[(X-E[X])^2\right] = \sum(x-E[X])^2 p(x) = E[X^2]-(E[X])^2\]
The standard deviation is \(\sigma=\sqrt{\mathrm{Var}(X)}\)
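Continuing the two-dice example in R, both forms of the variance formula give the same answer:

```r
x  <- 2:12
p  <- c(1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1) / 36
mu <- sum(x * p)
sum((x - mu)^2 * p)        # E[(X - E[X])^2] = 5.833
sum(x^2 * p) - mu^2        # E[X^2] - (E[X])^2, same value
sqrt(sum((x - mu)^2 * p))  # standard deviation
```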
Replace sums with integrals!
Mean: \(E[X] = \mu = \int_{-\infty}^{\infty}x f(x)\,dx\)
Variance: \(\mathrm{Var}(X) = \sigma^2 = \int_{-\infty}^{\infty}(x-\mu)^2 f(x)\,dx\)
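As a numerical sketch, R's integrate() can check these definitions for an example density (here an arbitrary Normal(2, 1)):

```r
f  <- function(x) dnorm(x, mean = 2, sd = 1)
mu <- integrate(function(x) x * f(x), -Inf, Inf)$value
mu                                                         # 2
integrate(function(x) (x - mu)^2 * f(x), -Inf, Inf)$value  # 1
```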
\[f(x) = \frac{1}{\sqrt{2\pi}\sigma}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)\]
Parameters: the mean, \(\mu\), and standard deviation, \(\sigma\)
Characteristics:
\(X\) can take on any value (i.e., the range goes from \(-\infty\) to \(\infty\)) …
R normal functions: dnorm, pnorm, qnorm, rnorm.
JAGS: dnorm (note that JAGS parameterizes the normal with the mean and the precision, \(\tau = 1/\sigma^2\), rather than the standard deviation)
For each probability distribution in R, there are 4 basic probability functions, starting with d, p, q, or r (see the example after this list):
d is for “density”; returns the value of \(f(x)\): the probability density function (continuous distributions) or the probability mass function (discrete distributions).
p is for “probability”; returns a value of \(F(x)\), the cumulative distribution function.
q is for “quantile”; returns a value from the inverse of \(F(x)\), also known as the quantile function.
r is for “random”; generates a random value from the given distribution.
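For example, with the standard normal (the values below are arbitrary):

```r
dnorm(1.96)   # f(1.96), height of the density curve
pnorm(1.96)   # F(1.96) = P(X <= 1.96), about 0.975
qnorm(0.975)  # inverse of F, returns about 1.96
rnorm(5)      # five random draws from N(0, 1)
```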
Use this graph, and R help functions if necessary, to complete Exercise 9.1 in the companion book.
Other notes:
CLT: if we sum many independent random quantities, the result is approximately normally distributed.
If we multiply many independent quantities, we get a log-normal distribution, since:
\[\log(X_1X_2\cdots X_n) = \log(X_1)+\log(X_2)+\ldots+\log(X_n)\]
Possible examples in biology? Population dynamic models
Explore briefly in R:
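A minimal simulation sketch (sample sizes and distributions are arbitrary choices):

```r
set.seed(1)
sums  <- replicate(10000, sum(runif(50)))             # sums of independent draws
prods <- replicate(10000, prod(runif(50, 0.5, 1.5)))  # products of independent draws
hist(sums)        # approximately normal (CLT)
hist(prods)       # right-skewed
hist(log(prods))  # approximately normal on the log scale
```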
Compare the simulated mean and variance to the log-normal expressions as functions of \((\mu, \sigma)\): \(E[X] = e^{\mu+\sigma^2/2}\) and \(\mathrm{Var}(X) = (e^{\sigma^2}-1)e^{2\mu+\sigma^2}\).
\(X \sim\) Bernoulli(\(p\))
\[f(x) = P(X = x) = p^x(1-p)^{1-x}\]
Discrete random variable with two possible outcomes
A binomial random variable counts the number of “successes” (any outcome of interest) in a sequence of trials where: the number of trials, \(n\), is fixed in advance; the trials are independent; and the probability of success, \(p\), is the same on each trial.
Formally, a binomial random variable arises from a sum of independent Bernoulli random variables, each with parameter, \(p\):
\[Y = X_1+X_2+\ldots+X_n\]
The R binomial functions (dbinom, pbinom, qbinom, rbinom) take size = \(n\) and prob = \(p\). Examples:
YAHTZEE! Count the number of sixes in five dice rolls
On each roll: S = roll a six, with \(P(S) = 1/6\); F = any other outcome, with \(P(F) = 5/6\).
\(X\) = number of S’s in 5 trials:
P(X = 5)?
= P(SSSSS) = P(S)P(S)P(S)P(S)P(S) = \(\left(\frac{1}{6}\right)^5 = 0.00013\)
P(X = 0)
\(= P(F)^5 = \left(\frac{5}{6}\right)^5 = 0.4019\)
P(X = 1)
= P(SFFFF) + P(FSFFF) + P(FFSFF) + P(FFFSF) + P(FFFFS)
= \(5\left(\frac{1}{6}\right)^{1}\left(\frac{5}{6}\right)^{4} = 0.4019\)
For a binomial random variable with n trials and probability of success p on each trial, the probability of exactly k successes in the n trials is:
\(P(X = k) = {n \choose k}p^k(1-p)^{n-k}\)
where \({n \choose k} = \frac{n!}{k!(n-k)!}\) and \(n! = n(n-1)(n-2)\cdots(2)(1)\)
Calculate P(X = 3) in the YAHTZEE example (n = 5, p = 1/6)
\(= {5 \choose 3}\left(\frac{1}{6}\right)^{3}\left(\frac{5}{6}\right)^2 = \frac{5\cdot4\cdot3\cdot2\cdot1}{(3\cdot2\cdot1)(2\cdot1)}\cdot\frac{1}{216}\cdot\frac{25}{36} = 0.0322\)
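A quick check of the hand calculation with dbinom:

```r
dbinom(3, size = 5, prob = 1/6)  # 0.03215
```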
Raymond Felton’s free throw percentage during the 2004-2005 season at North Carolina was 70%. If we assume successive attempts are independent, what is the probability that he would hit at least 4 out of 6 free throws in the 2005 Championship Game (he hit 5)?
\(P(X \ge 4) = P(X=4) + P(X=5) + P(X=6)\)
\(= {6 \choose 4}0.7^{4}0.3^2 + {6 \choose 5}0.7^{5}0.3^1 + 0.7^{6}\)
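Evaluating this in R:

```r
sum(dbinom(4:6, size = 6, prob = 0.7))  # 0.7443
1 - pbinom(3, size = 6, prob = 0.7)     # same answer via the CDF
```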
\[X \sim \text{Multinomial}(n, p_1, p_2, \ldots, p_k)\]
\(X = (x_1, x_2, \ldots, x_k)\) a multivariate random variable recording the number of events in each category
If \((n_1, n_2, \ldots, n_k)\) is the observed number of events in each category (with \(n_1 + n_2 + \cdots + n_k = n\)), then:
\(P((x_1, x_2, \ldots, x_k) = (n_1, n_2, \ldots, n_k)) = \frac{n!}{n_1!n_2! \cdots n_k!}p_1^{n_1}p_2^{n_2}\cdots p_k^{n_k}\)
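A hypothetical example using R’s dmultinom (ten fair-die rolls; the particular counts are arbitrary):

```r
# P of observing exactly (2, 2, 2, 2, 1, 1) of faces 1-6 in 10 rolls
dmultinom(x = c(2, 2, 2, 2, 1, 1), prob = rep(1/6, 6))
```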
Let \(N_t\) = number of events occurring in a time interval of length \(t\). What is the probability of observing \(k\) events in this interval?
\[P(N_t = k) = \frac{\exp(-\lambda t)(\lambda t)^k}{k!},\]
where \(\lambda\) is the average rate of events per unit time.
Events in 2-D space, if events occur at a constant rate, the probability of observing \(k\) events in an area of size \(A\):
\[P(N_A = k) = \frac{\exp(-\lambda A)(\lambda A)^k}{k!}\]
If \(A\) or \(t\) is held constant, it can be absorbed into \(\lambda\):
\[P(N = k) = \frac{\exp(-\lambda )(\lambda)^k}{k!}\]
The R Poisson functions (dpois, ppois, qpois, rpois) take lambda = \(\lambda\). Examples:
Suppose a certain region of California experiences about 5 earthquakes a year. Assume occurrences follow a Poisson distribution. What is the probability of 3 earthquakes in a given year?
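In R:

```r
dpois(3, lambda = 5)          # 0.1404
exp(-5) * 5^3 / factorial(3)  # same value from the formula
```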
Geometric distribution: \(X\) = number of failures until you get your first success; \(X \sim \text{Geom}(p)\).
\[f(x) = P(X = x) = (1-p)^xp\]
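A quick check with dgeom (using \(p = 1/6\) as an example):

```r
x <- 0:4
dgeom(x, prob = 1/6)  # matches the formula:
(5/6)^x * (1/6)
```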
\(X_r\) = number of failures, \(x\), before you get \(r\) successes; \(X_r \sim\) NegBinom(\(r\), \(p\))
\(P(X = x) = {x+r-1 \choose x}p^{r-1}(1-p)^xp\)
or
\(P(X = x) = {x+r-1 \choose x}p^{r}(1-p)^x\)
Express \(p\) in terms of mean, \(\mu\) and \(r\):
\[\mu = \frac{r(1-p)}{p} \Rightarrow p = \frac{r}{\mu+r} \text{ and } 1-p = \frac{\mu}{\mu+r}\]
Plugging these values into \(f(x)\) and changing \(r\) to \(\theta\), we get:
\(P(X = x) = {x+\theta-1 \choose x}\left(\frac{\theta}{\mu+\theta}\right)^{\theta}\left(\frac{\mu}{\mu+\theta}\right)^x\)
Then, let \(\theta\), the dispersion parameter, take on any positive value, not just integers as in the original parameterization (the binomial coefficient is then computed via gamma functions, \({x+\theta-1 \choose x} = \frac{\Gamma(x+\theta)}{\Gamma(\theta)\,x!}\)).
The R negative binomial functions (dnbinom, pnbinom, qnbinom, rnbinom) take (prob = \(p\), size = \(r\)) or (mu = \(\mu\), size = \(\theta\)). The negative binomial is overdispersed relative to the Poisson: \(\mathrm{Var}(X)/E[X] = 1 + \frac{\mu}{\theta}\), versus 1 for the Poisson.
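A sketch confirming that the two parameterizations agree (the \(r\) and \(p\) values are arbitrary):

```r
r <- 3; p <- 0.4
mu <- r * (1 - p) / p            # mu = r(1 - p)/p
dnbinom(0:5, size = r, prob = p)
dnbinom(0:5, size = r, mu = mu)  # identical
```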
Poisson is a limiting case (when \(\theta \rightarrow \infty\))
The negative binomial’s appeal as a data-generating mechanism in ecology includes the following.
If: \(X_i \sim\) Poisson(\(\lambda_i\)), with \(\lambda_i \sim\) Gamma(\(\alpha,\beta\)), then \(X_i\) has a negative binomial distribution.
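A simulation sketch of this gamma-Poisson mixture (assumed values: \(\alpha = 2\), \(\beta = 0.5\), so \(\theta = 2\) and \(\mu = \alpha/\beta = 4\)):

```r
set.seed(1)
n <- 100000
lambda <- rgamma(n, shape = 2, rate = 0.5)  # rates vary across observations
x <- rpois(n, lambda)
mean(x); var(x)                 # approx. mu = 4 and mu + mu^2/theta = 12
prop.table(table(x))[1:5]       # compare to the exact nbinom pmf:
dnbinom(0:4, size = 2, mu = 4)
```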
If observations are equally likely within an interval \((a, b)\):
\[f(x) = \frac{1}{b-a}, \quad a \le x \le b\]
Gamma(\(\alpha, \beta\)):
\[f(x) = \frac{1}{\Gamma(\alpha)}x^{\alpha-1}\beta^\alpha\exp(-\beta x)\]
Beta(\(\alpha, \beta\)):
\[f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}\]
Exponential(\(\lambda\)):
\[f(x) = \lambda \exp(-\lambda x)\]
How do we choose an appropriate distribution for our data? (Zuur et al. ch 8.7.1):
For a diagram showing links between distributions, see:
Diagram of distribution relationships
See handout with distributions (note that some can be written in multiple ways):
For example, gamma:
\[f(x) = \frac{1}{\Gamma(\alpha)}x^{\alpha-1}\beta^\alpha \exp(-\beta x)\]
\[f(x) = \frac{1}{\Gamma(\alpha)\beta^\alpha}x^{\alpha-1}\exp(-x/\beta)\]
In the first form \(\beta\) is a rate parameter; in the second it is a scale parameter (the reciprocal of the rate).
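In R, dgamma accepts either form (scale = 1/rate); the example values below are arbitrary:

```r
dgamma(2, shape = 3, rate = 0.5)  # beta as a rate
dgamma(2, shape = 3, scale = 2)   # beta as a scale; same value
```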